Bayesian estimation of rainfall dispersion in Thailand using gamma distribution with excess zeros

The gamma distribution is commonly used to model environmental data. However, rainfall data often contain zero observations, which violates the assumption that all observations must be positive in a gamma distribution, and so a gamma model with excess zeros, in which the occurrence of a zero is treated as a binary random variable, is required. Rainfall dispersion is of practical interest because confidence intervals for the variance of a gamma distribution with excess zeros help to examine rainfall intensity, which may be high or low risk. Herein, we propose confidence intervals for the variance of a gamma distribution with excess zeros by using fiducial quantities and parametric bootstrapping, as well as Bayesian credible intervals and highest posterior density intervals based on the Jeffreys', uniform, or normal-gamma-beta prior. The performances of the proposed confidence intervals were evaluated by establishing their coverage probabilities and average lengths via Monte Carlo simulations. The fiducial quantity confidence interval performed the best for a small probability of the sample containing zero observations (δ), whereas the Bayesian credible interval based on the normal-gamma-beta prior performed the best for large δ. Rainfall data from the Kiew Lom Dam in Lampang province, Thailand, are used to illustrate the efficacies of the proposed methods in practice.


INTRODUCTION
Thailand is a mainly agrarian country, with the largest agricultural area being in the north of the country, where the cooler climate makes it the best place for cultivation. Rainfall is an important factor for cultivation. The rainy season begins in mid-May and ends in mid-October, when the southwest monsoon predominates over Thailand and brings abundant annual rainfall. August to September is the wettest period of the year for most of the country, whereas January and December are very dry. Fluctuating rainfall makes it difficult to predict heavy precipitation that could lead to crop loss or damage. Since data from environmental, meteorological, climatological and pollution studies are often right-skewed, the gamma distribution is commonly used to model them (Piao & Zhi-Sheng, 2015; Pradhan & Kundu, 2011; Son & Oh, 2006; Wang et al., 2019). Many researchers have developed confidence intervals for the parameters of a gamma distribution. For example, Krishnamoorthy & León-Novelo (2014) proposed the parametric bootstrap (PB) confidence interval for the mean of a gamma distribution, which performed satisfactorily even for small samples. Sangnawakij, Niwitpong & Niwitpong (2015) proposed the method of variance estimates recovery (MOVER) and score and Wald intervals to construct confidence intervals for the ratio of the coefficients of variation (CVs) of gamma distributions, which performed better than classical estimators in terms of the expected length. Krishnamoorthy & Wang (2016) developed approximate fiducial quantities (FQs) for constructing the confidence interval for the mean of a gamma distribution, which performed satisfactorily when the shape parameter was around 0.5 or larger. FQs can be used to establish approximate solutions to many statistical problems and can be readily applied to handle both uncensored and censored samples. Wang et al. (2019) proposed FQs for the differences between the shape parameters, scale parameters, and means of two independent gamma distributions and found that the FQ-based confidence intervals were more accurate than other comparable methods.
Rainfall data often contain zero observations at certain times of the year, and so this must be taken into account when studying precipitation in Thailand. Aitchison (1955) investigated situations where data contain zero observations with probability 0 < δ < 1 while the positive observations have the residual probability 1 − δ. Aitchison & Brown (1963) introduced the delta-lognormal distribution (a lognormal distribution with an excess of zero observations), for which the number of zero observations is a random variable with a binomial distribution and the positive observations come from a lognormal distribution. Many researchers have developed methods to construct confidence intervals for the parameters of a delta-lognormal distribution. For example, Yosboonruang, Niwitpong & Niwitpong (2019) proposed new confidence intervals for the CV of a delta-lognormal distribution by using Bayesian methods based on the independent Jeffreys', Jeffreys' rule, or uniform prior and compared them with the fiducial generalized confidence interval (FGCI); the Bayesian confidence interval based on the independent Jeffreys' prior performed better than the other methods in all situations studied. Maneerat & Niwitpong (2021) proposed confidence intervals for the common mean of several delta-lognormal distributions based on FGCI, the large-sample (LS) approach, MOVER, PB, and highest posterior density (HPD) intervals based on the Jeffreys' rule (HPD-JR) or normal-gamma-beta (HPD-NGB) prior; those based on MOVER and PB outperformed the others in a variety of situations. Several researchers have examined methods for constructing confidence intervals for a gamma distribution with excess zeros. Ren, Liu & Pu (2021) proposed simultaneous confidence intervals for the difference between the means of multiple zero-inflated gamma distributions by using three fiducial methods and applied them to precipitation data.
Muralidharan & Kale (2002) defined a modified gamma distribution with a singularity at zero and produced confidence intervals for the mean of a mixed distribution. Lecomte et al. (2013) provided compound Poisson-gamma and delta-gamma distributions to handle zero-inflated continuous data under variable sampling volume. Kaewprasert, Niwitpong & Niwitpong (2022) proposed Bayesian estimation for the mean of delta-gamma distributions with application to rainfall data in Thailand.
In statistics, the variance, which gives a measure of the spread or variability of a distribution, is the second central moment, and the positive square root of the variance is the standard deviation (Casella & Berger, 2001). It is one of the most popular parameters of interest for probability and statistical inference.
We are interested in the confidence interval for the variance of a gamma distribution because this distribution is commonly used to model environmental data such as rainfall dispersion. Rainfall dispersion data can help to examine rainfall intensity, which may be high or low risk. We reviewed several studies on constructing confidence intervals for rainfall data, such as Yosboonruang, Niwitpong & Niwitpong (2019) and Maneerat & Niwitpong (2021), and found several interesting priors, including the Jeffreys', uniform, and normal-gamma-beta priors, which we therefore apply in this study.
Since no publications have yet been forthcoming on constructing confidence intervals for the variance of a gamma distribution with excess zeros, the objective of the present study is to construct the confidence interval for the variance of a gamma distribution with excess zeros based on FQ, PB, and six Bayesian-based methods: three Bayesian confidence intervals based on the Jeffreys' (BAY-J), uniform (BAY-U), or normal-gamma-beta (BAY-NGB) prior and three highest posterior density intervals based on the Jeffreys' (HPD-J), uniform (HPD-U), or normal-gamma-beta (HPD-NGB) prior.

METHODS
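Let X_i be a random variable following a gamma(α, β) distribution with shape parameter α and scale parameter β, with probability density function f(x; α, β). Suppose that the population of interest contains both zero and non-zero observations; the number of zero observations follows a binomial distribution while the non-zero observations follow a gamma distribution. The numbers of zero and non-zero observations are denoted n(0) and n(1), respectively, where n = n(0) + n(1). Let X = (X_1, X_2, ..., X_n) be a random sample from a gamma distribution with excess zeros, denoted by its parameters (δ, α, β). Its distribution function is

$$G(x;\delta,\alpha,\beta)=\begin{cases}\delta, & x=0,\\ \delta+(1-\delta)F(x;\alpha,\beta), & x>0,\end{cases}$$

where F(x; α, β) is the gamma cumulative distribution function. The maximum likelihood estimator of δ is δ̂ = n(0)/n. Writing M and V for the mean and variance of the gamma component, the population mean and variance of X are respectively given by

$$E(X)=(1-\delta)M,\qquad \tau=\mathrm{Var}(X)=(1-\delta)\left(V+\delta M^{2}\right).$$

The approaches used to construct the confidence intervals for τ are described in the following subsections.

The FQ confidence interval
Krishnamoorthy, Mathew & Mukherjee (2008) suggested that a gamma distribution can be approximated by applying the cubic transformation of a Gaussian distribution. Let Y_i = X_i^{1/3}, i = 1, 2, ..., n; then the Y_i are approximately normally distributed with mean µ and variance σ² respectively given by

$$\mu=(ab)^{1/3}\left(1-\frac{1}{9a}\right),\qquad \sigma^{2}=\frac{b^{2/3}}{9a^{1/3}}, \tag{5}$$

where a is the shape parameter and b is the scale parameter. The FQs for µ and σ² are, respectively,

$$Q_{\mu}=\bar{x}+\frac{Z}{\sqrt{\chi^{2}_{n-1}}}\sqrt{\frac{n-1}{n}}\,s,\qquad Q_{\sigma^{2}}=\frac{(n-1)s^{2}}{\chi^{2}_{n-1}}, \tag{6}$$

where x̄ and s are the observed values of X̄ and S (the mean and standard deviation of the cube-root-transformed sample), respectively; Z and χ²_{n−1} represent independent standard normal and chi-squared random variables, respectively; and n is the sample size. The FQs for the parameters of a gamma distribution can thus be derived by solving Eq. (5) for a and b:

$$Q_{a}=\frac{1}{9}\left[\left(1+\frac{Q_{\mu}^{2}}{2Q_{\sigma^{2}}}\right)+\sqrt{\left(1+\frac{Q_{\mu}^{2}}{2Q_{\sigma^{2}}}\right)^{2}-1}\,\right],\qquad Q_{b}=27\,Q_{a}^{1/2}\,Q_{\sigma^{2}}^{3/2}.$$

Krishnamoorthy & Wang (2016) proposed the FQ for the mean of a gamma distribution as follows:

$$Q_{M}=\left(\frac{Q_{\mu}}{2}+\sqrt{\left(\frac{Q_{\mu}}{2}\right)^{2}+Q_{\sigma^{2}}}\,\right)^{3},$$

where Q_µ and Q_σ² are defined in Eq. (6). Li, Zhou & Tian (2013) proposed an FQ for δ, denoted here by Q_δ. We can express the FQ for the variance as follows. If V = ab², then solving Eq. (5) for V gives

$$V=\left(\frac{\mu+\sqrt{\mu^{2}+4\sigma^{2}}}{2\,(9^{-1/4})(\sigma^{2})^{-1/4}}\right)^{4}.$$

Thus, the FQ for the gamma variance can be obtained as

$$Q_{V}=\left(\frac{Q_{\mu}+\sqrt{Q_{\mu}^{2}+4Q_{\sigma^{2}}}}{2\,(9^{-1/4})(Q_{\sigma^{2}})^{-1/4}}\right)^{4},$$

where Q_µ and Q_σ² are defined in Eq. (6). Thus, the FQ for τ is of the form

$$Q_{\tau}=(1-Q_{\delta})\left(Q_{V}+Q_{\delta}\,Q_{M}^{2}\right).$$

Therefore, the 100(1 − α)% confidence interval for τ is

$$\left[Q_{\tau}(\alpha/2),\;Q_{\tau}(1-\alpha/2)\right],$$

where Q_τ(α/2) and Q_τ(1 − α/2) are the (α/2)-th and (1 − α/2)-th percentiles of Q_τ, respectively. The confidence intervals for τ can be obtained by using Algorithm 1.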
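The closed form for V follows in a few steps from the cube-root moment approximation. The derivation below is a sketch that assumes the approximate moments µ = (ab)^{1/3}(1 − 1/(9a)) and σ² = b^{2/3}/(9a^{1/3}); the auxiliary symbol r is introduced here only for illustration.

```latex
\begin{align*}
r &:= (ab)^{1/3}
  &&\Rightarrow\quad \mu = r - \frac{r}{9a}, \qquad
     \sigma^{2} = \frac{b^{2/3}}{9a^{1/3}} = \frac{r^{2}}{9a},\\
r\,(r-\mu) &= \frac{r^{2}}{9a} = \sigma^{2}
  &&\Rightarrow\quad r^{2} - \mu r - \sigma^{2} = 0
     \quad\Rightarrow\quad r = \frac{\mu + \sqrt{\mu^{2} + 4\sigma^{2}}}{2}
     \quad (\text{positive root}),\\
V = ab^{2} &= (ab)^{4/3}\,\frac{b^{2/3}}{a^{1/3}} = 9\,\sigma^{2}\,r^{4}
  &&\Rightarrow\quad
  V = \left(\frac{\mu + \sqrt{\mu^{2} + 4\sigma^{2}}}
                 {2\,(9^{-1/4})(\sigma^{2})^{-1/4}}\right)^{4}.
\end{align*}
```

The same root r also gives the gamma mean as ab = r³, which is why the mean and the variance can be expressed through the same pair (µ, σ²).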

Algorithm 1 FQ
1: Generate x from a gamma distribution with excess zeros, and compute x̄ and s² of the cube-root-transformed sample.
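The FQ construction can be illustrated with a short simulation. The sketch below is not the authors' implementation: the function name `fq_interval_variance` is hypothetical, the cube-root statistics are computed from the n(1) positive observations (an assumption), and the draw for δ uses a Beta(n(0) + 0.5, n(1) + 0.5) stand-in rather than the Li, Zhou & Tian (2013) quantity, which is not reproduced here.

```python
import numpy as np

def fq_interval_variance(x, level=0.95, draws=5000, seed=0):
    """Sketch of an FQ-type interval for tau, the variance of a gamma
    distribution with excess zeros (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    pos = x[x > 0]
    n1 = pos.size                 # number of positive observations
    n0 = x.size - n1              # number of zero observations

    # Cube-root transform: approximately normal for gamma data.
    y = np.cbrt(pos)
    ybar, s2 = y.mean(), y.var(ddof=1)

    # Fiducial draws for the normal mean and variance of the transformed data.
    z = rng.standard_normal(draws)
    chi2 = rng.chisquare(n1 - 1, draws)
    q_sigma2 = (n1 - 1) * s2 / chi2
    q_mu = ybar + z * np.sqrt((n1 - 1) * s2 / (n1 * chi2))

    # r = (ab)^(1/3) is the positive root of r^2 - mu*r - sigma^2 = 0.
    r = (q_mu + np.sqrt(q_mu**2 + 4.0 * q_sigma2)) / 2.0
    q_mean = r**3                      # gamma mean, ab
    q_var = 9.0 * q_sigma2 * r**4      # gamma variance, ab^2

    # ASSUMPTION: Beta stand-in for the fiducial quantity of delta.
    q_delta = rng.beta(n0 + 0.5, n1 + 0.5, draws)

    # Variance of the zero/gamma mixture.
    q_tau = (1.0 - q_delta) * (q_var + q_delta * q_mean**2)
    lo, hi = np.quantile(q_tau, [(1 - level) / 2, (1 + level) / 2])
    return float(lo), float(hi)

# Usage on synthetic data: delta = 0.2, shape = 7.5, scale = 1.
rng = np.random.default_rng(42)
sample = rng.gamma(7.5, 1.0, 100)
sample[rng.random(100) < 0.2] = 0.0
print(fq_interval_variance(sample))
```

The interval is read directly from the percentiles of the simulated Q_τ values, so its Monte Carlo error shrinks as `draws` grows.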

The PB confidence interval
The log-likelihood function for the shape α and scale β parameters of a gamma distribution is given by Saulo et al. (2018). The maximum likelihood estimators (MLEs) of α and β are obtained by maximizing this function, and the PB quantity for the variance of a gamma distribution with excess zeros is built from these estimators together with δ̂. The 100(1 − α)% confidence interval for τ is then obtained from the resulting bootstrap distribution.
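A parametric bootstrap of this kind can be sketched as follows. This is an illustration under stated assumptions, not the authors' PB quantity: it uses a plain percentile interval, replaces iterative likelihood maximization with a standard closed-form approximation to the shape MLE, works in the scale parameterization, and all function names are hypothetical.

```python
import numpy as np

def fit_delta_gamma(x):
    """Estimate (delta, shape, scale); the shape uses a closed-form
    approximation to the MLE rather than a numerical maximizer."""
    x = np.asarray(x, dtype=float)
    pos = x[x > 0]
    delta = 1.0 - pos.size / x.size
    s = np.log(pos.mean()) - np.log(pos).mean()   # > 0 by Jensen's inequality
    shape = (3.0 - s + np.sqrt((s - 3.0) ** 2 + 24.0 * s)) / (12.0 * s)
    scale = pos.mean() / shape                    # exact MLE of scale given shape
    return delta, shape, scale

def tau_from_params(delta, shape, scale):
    """Variance of a gamma distribution with excess zeros."""
    gamma_mean = shape * scale
    gamma_var = shape * scale**2
    return (1.0 - delta) * (gamma_var + delta * gamma_mean**2)

def pb_interval(x, level=0.95, boot=1000, seed=0):
    """Percentile-type parametric bootstrap interval for tau (assumption)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    delta, shape, scale = fit_delta_gamma(x)
    taus = []
    for _ in range(boot):
        n1 = rng.binomial(n, 1.0 - delta)         # resampled count of positives
        if n1 < 2:                                # too few positives to refit
            continue
        xb = np.concatenate([np.zeros(n - n1), rng.gamma(shape, scale, n1)])
        taus.append(tau_from_params(*fit_delta_gamma(xb)))
    lo, hi = np.quantile(taus, [(1 - level) / 2, (1 + level) / 2])
    return float(lo), float(hi)

# Usage on synthetic data: delta = 0.5, shape = 2.5, scale = 1.
rng = np.random.default_rng(3)
sample = rng.gamma(2.5, 1.0, 50)
sample[rng.random(50) < 0.5] = 0.0
print(pb_interval(sample))
```

Resampling the number of positives from a binomial distribution keeps the uncertainty in δ̂ inside the bootstrap, rather than conditioning on the observed n(0).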
HPD intervals are constructed from the posterior distribution based on the Bayesian approach. The HPD consists of the values of the parameter for which the posterior density is highest (Casella & Berger, 2001), while the HPD interval is the narrowest possible interval for the parameter of interest at probability 100(1−α)% (Maneerat, Niwitpong & Niwitpong, 2020).
In this section, the Bayesian confidence intervals are constructed using the Jeffreys', uniform, and normal-gamma-beta priors.
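The difference between an equal-tailed Bayesian interval and an HPD interval is easiest to see on posterior draws. The sketch below assumes the posterior of the parameter is available as a sample and is unimodal; the function names are hypothetical, and the gamma draws merely stand in for a right-skewed posterior.

```python
import numpy as np

def equal_tailed_interval(draws, level=0.95):
    """Interval between the (1-level)/2 and (1+level)/2 sample quantiles."""
    lo, hi = np.quantile(draws, [(1 - level) / 2, (1 + level) / 2])
    return float(lo), float(hi)

def hpd_interval(draws, level=0.95):
    """Shortest interval containing a fraction `level` of the sorted draws
    (an empirical HPD interval for a unimodal posterior)."""
    d = np.sort(np.asarray(draws, dtype=float))
    n = d.size
    k = int(np.ceil(level * n))            # number of draws to cover
    widths = d[k - 1:] - d[: n - k + 1]    # width of every window of k draws
    i = int(np.argmin(widths))             # start of the narrowest window
    return float(d[i]), float(d[i + k - 1])

# Usage: a right-skewed stand-in posterior.
rng = np.random.default_rng(0)
posterior_draws = rng.gamma(2.0, 1.0, 10_000)
print(equal_tailed_interval(posterior_draws))
print(hpd_interval(posterior_draws))
```

For a skewed posterior the HPD window shifts toward the mode and is shorter than the equal-tailed interval at the same probability level, which is the property exploited when comparing average lengths.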
We compute the mean and variance of the gamma distribution by using µ_jef | σ², x and σ²_jef | x. The confidence interval and HPD interval for τ based on the Jeffreys' prior are then obtained from these quantities.
The uniform prior for σ² is p(σ²) ∝ 1 (Kalkur & Rao, 2017), from which the marginal posterior distributions of σ² and µ are derived. We compute the mean and variance of the gamma distribution by using µ_unif | σ², x and σ²_unif | x. The confidence interval and HPD interval for τ based on the uniform prior are then obtained from these quantities.

The BAY-NGB and HPD-NGB intervals
Maneerat & Niwitpong (2021) defined the normal-gamma-beta prior for (µ, λ, δ), where λ = σ^−2, (µ, λ) follows a normal-gamma distribution and δ follows a beta distribution. From this prior, the marginal posterior distributions of δ, σ² and µ are derived. We compute the mean and variance of the gamma distribution by using µ_NGB | x and σ²_NGB | x. The confidence interval and HPD interval for τ based on the normal-gamma-beta prior are then obtained from these quantities.

Algorithm 3 Bayesian interval
1: Generate x from a gamma distribution with excess zeros, and compute the estimates of δ, µ, and σ².
2: Generate δ|x from Eqs. (20), (27)

SIMULATION STUDIES AND RESULTS
A Monte Carlo simulation study with 10,000 replications (M), and 5,000 repetitions (m) for FQ and PB, was conducted at a nominal confidence level of 0.95. We set sample size n as 30, 50, 100 or 200 and probability of zeros δ as 0.2, 0.5 or 0.8, for which we set shape parameter α as 7.00, 7.50 or 7.75; 2.00, 2.50 or 2.75; and 1.25, 1.50 or 1.75, respectively. We set rate parameter β as 1 for all cases. The performances of the confidence intervals were assessed by comparing their coverage probabilities (CPs) and average lengths (ALs); the best-performing confidence interval for a particular situation was identified as the one with a CP close to or greater than 0.95 and the shortest AL. The confidence intervals for the variance of a gamma distribution with excess zeros were constructed using FQ, PB, BAY-J, HPD-J, BAY-U, HPD-U, BAY-NGB and HPD-NGB. We report the CPs and ALs of the nominal 95% two-sided confidence intervals for the variance of a gamma distribution with excess zeros in Table 1 and Figs. 1, 2 and 3.
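The evaluation loop behind CP and AL can be sketched generically. In the code below, the function names are hypothetical and `naive_interval` (a nonparametric percentile bootstrap of the sample variance) is only a placeholder to make the sketch runnable; it is not one of the eight methods studied, and the replication counts are reduced for speed.

```python
import numpy as np

def rdelta_gamma(rng, n, delta, shape, rate=1.0):
    """Draw n observations: zero with probability delta, else gamma."""
    x = rng.gamma(shape, 1.0 / rate, n)
    x[rng.random(n) < delta] = 0.0
    return x

def true_tau(delta, shape, rate=1.0):
    """True variance of the gamma distribution with excess zeros."""
    gamma_mean, gamma_var = shape / rate, shape / rate**2
    return (1.0 - delta) * (gamma_var + delta * gamma_mean**2)

def naive_interval(x, rng, level=0.95, boot=200):
    """PLACEHOLDER method: percentile bootstrap of the sample variance."""
    idx = rng.integers(0, x.size, size=(boot, x.size))
    v = x[idx].var(axis=1, ddof=1)
    lo, hi = np.quantile(v, [(1 - level) / 2, (1 + level) / 2])
    return lo, hi

def coverage_and_length(interval_fn, n, delta, shape, reps=500, seed=1):
    """Monte Carlo estimates of coverage probability and average length."""
    rng = np.random.default_rng(seed)
    target = true_tau(delta, shape)
    hits, total_length = 0, 0.0
    for _ in range(reps):
        x = rdelta_gamma(rng, n, delta, shape)
        lo, hi = interval_fn(x, rng)
        hits += int(lo <= target <= hi)   # does the interval cover tau?
        total_length += hi - lo
    return hits / reps, total_length / reps

# Usage: one cell of the design, n = 50, delta = 0.5, shape = 2.5, rate = 1.
cp, al = coverage_and_length(naive_interval, n=50, delta=0.5, shape=2.5)
print(f"CP = {cp:.3f}, AL = {al:.3f}")
```

Any interval method with the same call signature can be passed as `interval_fn`, so the same loop produces comparable CP and AL values across methods.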
The CPs of the PB, FQ, HPD-U, BAY-NGB, and HPD-NGB confidence intervals were greater than or close to the nominal confidence level of 0.95 in all situations studied. For a small-to-moderate sample size, FQ and the HPD-U performed well for small δ whereas BAY-NGB and HPD-NGB performed well for large δ. For a large sample size, FQ performed well for small δ whereas BAY-NGB performed well for large δ. Although the expected lengths of the HPD-J were shorter than the other methods, the CPs of BAY-J and HPD-J were lower than the nominal confidence level in all cases.
The findings show that although FQ, HPD-U, BAY-NGB, and HPD-NGB attained acceptable CPs, the ALs of BAY-NGB and HPD-NGB were shorter than those of the other methods, and so they can be recommended for constructing the confidence interval for the variance of a gamma distribution with excess zeros. For HPD-NGB, which was developed from the study of Maneerat & Niwitpong (2021), the simulation results are similar to those reported in that study. HPD-NGB performed well for small-to-large sample sizes. BAY-NGB and HPD-NGB performed best because they attained stable CPs and their ALs were shorter than those of the other methods for all sample sizes.
A referee suggested checking the validity and robustness of the model for smaller sample sizes with a moderate number of zeros. We therefore conducted an additional simulation study with 10,000 replications (M), and 5,000 repetitions (m) for FQ and PB, at a nominal confidence level of 0.95. We set sample size n as 10 or 20 and probability of zeros δ as 0.2 or 0.5, for which we set shape parameter α as 7.00, 7.50 or 7.75; and 2.00, 2.50 or 2.75, respectively. We set rate parameter β as 1 for all cases. The results (not shown here) indicate that the CPs of the FQ, HPD-U, BAY-NGB, and HPD-NGB confidence intervals were greater than or close to the nominal confidence level of 0.95 in all situations studied. Although FQ, HPD-U, BAY-NGB, and HPD-NGB attained acceptable CPs, the ALs of HPD-NGB were shorter than those of the other methods. Thus, even for small sample sizes (n = 10 and n = 20), our findings show that BAY-NGB and HPD-NGB can be recommended for constructing the confidence interval for the variance of a gamma distribution with excess zeros.

EMPIRICAL APPLICATION OF THE PROPOSED CONFIDENCE INTERVALS
The confidence interval performances were compared by using real-world datasets comprising monthly rainfall data reported by the Upper Northern Region Irrigation Hydrology for January and February 1993 to 2021 at the Kiew Lom Dam, Lampang province, Thailand. First, the best fit for the positive rainfall data among the normal, lognormal, Cauchy, and gamma models was examined by calculating their Akaike information criterion (AIC) and Bayesian information criterion (BIC) values (Table 2). The results show that the lowest AIC and BIC values (207.7139 and 210.2301, respectively) were for the gamma distribution, indicating that it was the best fit for the data.
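The model comparison rests on AIC = 2k − 2 log L and BIC = k log n − 2 log L, where k is the number of fitted parameters and n the number of positive observations. The sketch below is illustrative only: it runs on synthetic data (the rainfall series is not reproduced here), omits the Cauchy model because its fit needs numerical optimization, uses a closed-form approximation to the gamma shape MLE, and all function names are hypothetical.

```python
import math
import numpy as np

def aic_bic(loglik, k, n):
    """Return (AIC, BIC) for a model with k parameters fitted to n points."""
    return 2 * k - 2 * loglik, k * math.log(n) - 2 * loglik

def loglik_normal(x):
    """Maximized normal log-likelihood (MLE variance uses divisor n)."""
    return -0.5 * x.size * (math.log(2 * math.pi * x.var()) + 1.0)

def loglik_lognormal(x):
    """Maximized lognormal log-likelihood via the normal fit to log(x)."""
    lx = np.log(x)
    return loglik_normal(lx) - float(lx.sum())

def loglik_gamma(x):
    """Gamma log-likelihood at an approximate MLE (closed-form shape)."""
    s = math.log(x.mean()) - float(np.log(x).mean())
    shape = (3.0 - s + math.sqrt((s - 3.0) ** 2 + 24.0 * s)) / (12.0 * s)
    scale = float(x.mean()) / shape
    return float(np.sum((shape - 1.0) * np.log(x) - x / scale)
                 - x.size * (shape * math.log(scale) + math.lgamma(shape)))

# Usage on SYNTHETIC positive data with 26 observations.
rng = np.random.default_rng(0)
positives = rng.gamma(0.73, 25.0, 26)
for name, ll in [("normal", loglik_normal(positives)),
                 ("lognormal", loglik_lognormal(positives)),
                 ("gamma", loglik_gamma(positives))]:
    aic, bic = aic_bic(ll, k=2, n=positives.size)
    print(f"{name:10s} AIC = {aic:8.3f}  BIC = {bic:8.3f}")
```

With k = 2 and n = 26, BIC exceeds AIC by 2 log 26 − 4 ≈ 2.516, which matches the gap between the reported gamma values 210.2301 and 207.7139.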
The summary statistics for the rainfall data at the Kiew Lom Dam in Lampang province are x̄ = 18.6461, n = 58, n(1) = 26, and n(0) = 32, while the maximum likelihood estimates of δ, α, β and τ are δ̂ = 0.5517, α̂ = 0.7297, β̂ = 0.0391 and τ̂ = 299.5542, respectively. The calculated two-sided confidence intervals for τ are reported in Table 3. In the simulation setting closest to these data (n = 50 and δ = 0.5), FQ and BAY-NGB obtained CPs close to the nominal confidence level of 0.95, but BAY-NGB obtained the shortest length. Thus, the BAY-NGB method is recommended for constructing the confidence interval for the variance in rainfall data in January and February at the Kiew Lom Dam in Lampang province.
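The reported point estimate can be checked by hand from the summary statistics. The arithmetic below is a sketch that assumes x̄ is the mean of the 26 positive observations and that β̂ acts as a rate (so that α̂/β̂ ≈ x̄), which is what the reported numbers appear to be consistent with; the function name is hypothetical.

```python
def variance_with_excess_zeros(delta, shape, rate):
    """tau = (1 - delta) * (gamma variance + delta * gamma mean^2)."""
    gamma_mean = shape / rate
    gamma_var = shape / rate**2
    return (1.0 - delta) * (gamma_var + delta * gamma_mean**2)

n, n0 = 58, 32
delta_hat = n0 / n                      # 32/58, reported as 0.5517
alpha_hat, xbar = 0.7297, 18.6461

# Using the rounded rate reported in the text.
tau_rounded = variance_with_excess_zeros(delta_hat, alpha_hat, 0.0391)

# Using the unrounded rate implied by alpha_hat / xbar (ASSUMPTION).
tau_implied = variance_with_excess_zeros(delta_hat, alpha_hat, alpha_hat / xbar)

print(round(delta_hat, 4), tau_rounded, tau_implied)
```

Both values land within about 1 of the reported τ̂ = 299.5542, with the remaining gap attributable to rounding of the published estimates.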

CONCLUSIONS
We constructed confidence intervals for the variance of a gamma distribution with excess zeros by using the PB, FQ, BAY-J, HPD-J, BAY-U, HPD-U, BAY-NGB, and HPD-NGB approaches. The CPs and ALs of the methods were assessed by Monte Carlo simulation for various situations and by using real precipitation data following a gamma distribution with excess zeros. Our findings show that BAY-NGB and HPD-NGB can be recommended for constructing the confidence interval for the variance of a gamma distribution with excess zeros. In future research, we will investigate constructing confidence intervals for the difference between the variances of gamma distributions with excess zeros.